Homepage

The Lambda Calculus Mail Series

Contents

This page is just a mail series (with some editing) that I started while I was trying to read up on lambda calculus from various websites and papers. This, I think, will make nice reading for anybody interested in the subject. But bear in mind that this was done just for fun, as an experiment in a group of friends using mail to learn something new.

A lot of things could be missing, wrong, or stupid; please let me know if this is the case.

Series on Lambda Calculus

I am ambitiously planning to shed my laziness and write a series on Lambda Calculus. I plan to write little mails every day to all of you, slowly explaining lambda calculus. I would recommend this practice over sending links, because I think (and hope) such a thing would be more interactive and productive. I hope you guys will actively participate, ask questions, and tell stuff. Also, reading little mails and thinking about them gives a better understanding, and better motivation to read and learn, than one huge article written by me or anyone else. I wish each of you would pick some topic and start such a series of your own. Let me know about your plans in this regard. Maybe we could have a collection of such articles and put them on our web page!

One other thing we notice when we read papers and articles is that we get stuck somewhere midway, and after that point don't really understand anything. But when we plan such a series, we can work together in understanding it. We also have an idea of what we understand and what we don't, so if I see something that I think we'll struggle to understand, I, having understood it, will try to elaborate and explain it in easy terms, so that all of us do understand. Also, I plan to keep the mails really small, addressing little bits and pieces every time, so that it is easier to digest.

Tell me how you like the idea and if you have any suggestions. Also, start thinking about doing something yourself too. In fact, I think it's even OK to send a huge article (somebody else's) on a subject, bit by bit every day (just copy and paste); as long as we all read it and discuss it, it's worth the time and the effort (or so I think).

Let's hope all this works out well and reaches completion.

Table of contents

Lets Start At The Very Beginning

Since this is the first in the series, I won't jump right onto the techie stuff. Today let's just see where the story of lambda calculus fits into the bigger picture of Mathematics and Computer Science. It's best to know what we are studying and what role it plays before jumping into details. Today's episode is therefore introductory, and be warned that it is long! Later on, I will stick to small mails as promised.

When we think of lambda calculus, we think of it as that thing the theory types use when they discuss programming languages. But wait! Lambda calculus is much more fundamental than that! Lambda calculus was invented long before there were any computers (in the modern sense, as programmable machines) or computer languages. Alonzo Church invented lambda calculus as machinery to formalize set theory.

You can't get any more fundamental than that! This is hardcore low-level math! Set theory is what Russell and Whitehead used in their attempt to express the entire field of mathematics as a logical theory (deriving all of mathematics from a set of axioms).

They derived the concept of numbers and arithmetic from this. Lambda calculus is considered a stroke of genius: a notation intended to formalize this very set theory (it ran into the same paradoxes that naive set theory fell into, but that is another story). Now we know what we are dealing with!

But then where does computer science fit in? The set of functions that are definable in lambda calculus is exactly the set of functions that are computable! In fact, lambda calculus and Turing Machines have the same expressive power. Also, they were invented (or conceptualized, or whatever) at just about the same time. Church and Turing even worked together after that in Princeton, before World War II broke out and Turing had to go. Lambda calculus is the backbone of computer science and computability. There are people who say that the various notions derived from the study of lambda calculus have been more useful than the Turing Machine has ever been (but well, the point is moot). But no one questions the value that lambda calculus has in computer science. I must tell you, though, that lambda calculus and computer languages were first related only when Peter Landin used it to give the semantics of Algol 60.

Here are some points that I have pinched from a paper, which give you an idea of how important it is.

  1. Variable binding and scoping in block-structured languages can be modelled.
  2. Several function calling mechanisms - call-by-name, call-by-value, and call-by-need - can be modelled. The latter two are also known as strict evaluation and lazy evaluation, respectively.
  3. The calculus is Turing universal, and is probably the most natural model of computation. Church's Thesis asserts that the 'computable' functions are precisely those that can be represented in the calculus.
  4. All the usual data structures of functional programming, including infinite lists, can be represented. Computation on infinite objects can be defined formally in the calculus.
  5. Its notions of confluence (Church-Rosser property), termination, and normal form apply generally in rewriting theory.
  6. Lisp, one of the first major programming languages, was inspired by the lambda calculus. Many functional languages, such as ML, consist of little more than the lambda calculus with additional syntax (of course they do have really advanced type systems, and in that way have gone a long way from the untyped lambda calculus of Church).
  7. The two main implementation methods, the SECD machine (for strict evaluation) and combinator reduction (for lazy evaluation), exploit properties of the lambda calculus.
  8. The lambda calculus and its extensions can be used to develop better type systems, such as polymorphism, and to investigate theoretical issues such as program synthesis.
  9. Denotational semantics, which is an important method for formally specifying programming languages, employs the lambda calculus for its notation.
Now for all those wondering what the hell is "calculus", here is what Merriam-Webster has to say about it:

  1. a : a method of computation or calculation in a special notation (as of logic or symbolic logic)
    b: the mathematical methods comprising differential and integral calculus
  2. CALCULATION
  3. a : a concretion usually of mineral salts around organic material found especially in hollow organs or ducts
    b : TARTAR (sense 2)
  4. a system or arrangement of intricate or interrelated parts
The meanings 1.a, 1.b, and 4 seem relevant here.

Okay, I'll let you off with a last little bit of chronological history:

Around 1924 Schönfinkel developed a simple theory of functions. In 1934, Alonzo Church introduced lambda calculus and used it to develop a formal set theory, which turned out to be inconsistent. More successfully, he used it to formalize the syntax of Principia Mathematica (Whitehead and Russell). In the 1940s, Haskell B. Curry introduced combinatory logic, a variable-free theory of functions. More recently, Roger Hindley developed what is known as type inference. Robin Milner extended it to develop the polymorphic type system of ML, and published a proof that a well-typed program cannot suffer a run-time type error (gotta investigate this, somebody take it up). Dana Scott developed models of the lambda calculus. With his "domain theory", he and Christopher Strachey introduced denotational semantics.

Peter Landin used the lambda calculus to analyze Algol 60, and introduced ISWIM as a framework for future languages (this series might dedicate a substantial portion to ISWIM too). His SECD machine, with extensions, was used to implement ML and other strict functional languages (strict languages can be thought of as those where arguments are evaluated before calling a function). Christopher Wadsworth developed graph reduction as a method for performing lazy evaluation with lambda expressions (this is the basis of implementations of lazy languages like Haskell). David Turner applied graph reduction to combinators, which led to efficient implementations of lazy languages.

Next we'll be jumping right on to lambda calculus.

P.S.

  1. Comments/criticisms/questions are very much welcome!
  2. Yes, yes, the mails in the future will definitely be shorter.

Acknowledgements:
Lots of it is from "Foundations of Functional Programming" by Lawrence Paulson.

Questions and Discussions

Feedback

As mentioned above,
lambda calculus = machinery for formalising set theory.
set theory = derive mathematics from a set of axioms
How does one express some mathematical concept using set theory, i.e. how are the axioms expressed? Are they in the form of some notation or what? My basic doubt is: what does the term formalization mean? What are the benefits of formalization (a standard way of expressing something? something that could be understood by a machine?)

Reply

Ok, great question(s)! I will try to answer some of this, but let it be more a discussion kind of thing than question-and-answer, since my answers will by no means be accurate (I am no expert either!). So if anybody has any more ideas or questions, go ahead and write. I just went through some sites on set theory, and I'll try to put forth the history of set theory, and let's see if that tells us more about set theory and its formalization. The other question, about what formalization is and why we do it, we'll tackle later. What I give next is mainly stolen from ory.html. I purposely chose a historical explanation because I think that will give a better idea. But note that it is heavily edited by me. I have added my comments in square brackets. Read on....

The history of set theory is rather different from the history of most other areas of mathematics. For most areas a long process can usually be traced in which ideas evolve until an ultimate flash of inspiration, often by a number of mathematicians almost simultaneously, produces a discovery of major importance. Set theory however is rather different. It is the creation of one person, Georg Cantor. Before we take up the main story of Cantor's development of the theory, we first examine some early contributions. The idea of infinity had been the subject of deep thought from the time of the Greeks. In 1847 Bolzano considered sets, with the following definition: "an embodiment of the idea or concept which we conceive when we regard the arrangement of its parts as a matter of indifference." [God knows what this means! -S]

Bolzano defended the concept of an infinite set. At this time many believed that infinite sets could not exist. Bolzano gave examples to show that, unlike for finite sets, the elements of an infinite set could be put in 1-1 correspondence with elements of one of its proper subsets. This idea eventually came to be used in the definition of a finite set. It was with Cantor's work however that set theory came to be put on a proper mathematical basis. Cantor moved from number theory to papers on trigonometric series. These papers contain Cantor's first ideas on set theory and also important results on irrational numbers.

In 1874 Cantor published an article in Crelle's Journal which marks the birth of set theory. In his 1874 paper Cantor considers at least two different kinds of infinity. Before this, orders of infinity did not exist; all infinite collections were considered 'the same size'. Cantor shows that the real numbers cannot be put into 1-1 correspondence with the natural numbers. [So there are different degrees of infinity, real numbers being the higher degree -S]. Cantor then introduces the idea of equivalence of sets and says two sets are equivalent or have the same power if they can be put in 1-1 correspondence.

[Lots of people criticized such "useless" mathematics. Infinities! Transcendental numbers! All crap! - S] Cantor claimed that mathematics is quite free and that any concepts may be introduced, subject only to the condition that they are free of contradiction and defined in terms of previously accepted concepts.
[Lot of interesting stuff cut off by me. -S]
The 'ultimate' paradox was found by Russell in 1902 [For more, see - S]. It simply defined a set A, "of all such sets that are not members of themselves".

Russell then asked: Is A an element of A? Both the assumption that A is a member of A and the assumption that A is not a member of A lead to a contradiction. The set construction itself appears to give a paradox. [Think a lot about this, and see to it that you do understand the paradox. Also, if Russell's paradox is in a system - if "X is a member" and "X is not a member" are both provable - then any statement whatsoever can be proved! And this is absurd! For more detail, visit the link above. - S]

When Russell told this to Frege, Frege said: A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. In this position I was put by a letter from Mr Bertrand Russell as the work was nearly through the press. [Mathematicians were really very worried about this paradox, and they started wondering which assumption (axiom) in their theory caused it. As they got more and more worried, in parallel they were looking for axioms that give the power of set theory without running into paradoxes. These were the various attempts to axiomatize set theory. Also, they were scared that some intuitive unknown assumption might be causing all these problems. So they planned to use rigorous methods and "formalize" the entire theory, so that there are no pitfalls due to the play of the human mind (*very* important), and also to see whether different choices of axioms and notations lead to any progress. Also, once you formalize a system and play around with the rules, you find many extremely surprising, unintuitive results, which you'll not find unless you forget all the "meanings" and just play around with the symbols. Hope you get an idea of how formalization helps, and why it began. I'll stop here, otherwise the mail will get bigger and you guys won't read it at all! -S]

Feedback

OK... as you said... GIVE IT A LOT OF THOUGHT... to understand the Russell's Paradox part... I DID... but I really am not sure if I have understood it properly... though I have got some idea as to what it's trying to say... And one more thing: what do you mean by "Also, once when you formalize a system, and play around with the rules, you find many extremely surprising unintuitive results, which you'll not find unless you forget all the "meanings" and just play around with the symbols." I mean... is this an advantage or a disadvantage of formalization???

Reply

Ok, I guess I'll send in more stuff about Russell's Paradox. I'll just copy it from http://www.mcn.net/~jimloy/russell.html.

Before I go ahead, some stuff from me. Cantor's set theory was hailed as a great work and very successful, and if this were true, and if all of mathematics could be derived from it, everything could be done by a machine! So mathematicians would no longer be required! But then came Russell's Paradox, which showed that set theory (as it then stood) is inconsistent.

This is just the background story, so that you know what you are going to read next. BTW, then came Kurt Gödel and shook all the three worlds by showing that every (sufficiently powerful) logical theory must be either inconsistent or incomplete! Anybody interested, investigate that and send in stuff. But we'll stick to Russell's paradox for the nonce.

Table of contents

Lambda Calculus - The First Glimpse

As the subject says, today we take our first glimpse at the creature that is lambda calculus. At first sight, it'll seem remarkably cute and simple, giving no hint whatsoever of the enormous power that lies within. Ok, so let's define a lambda expression. (I use a backslash for lambda.)

DEFINITION:
Any symbol x is a lambda expression.
If M is a lambda expression, so is \x.M (This is called lambda abstraction. Here, M is called the body).
If M and N are lambda expressions, so is M N.

That's it! So let's write a few lambda expressions:
x, \x.x, \x.\y.y, \x.z (\y.x), \x.\y.\z.x y z
Okay, we get the idea.
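If it helps to see this definition as a data structure, here is one possible way of writing lambda expressions as plain Python data. This is just a sketch of my own; the tuple shapes and the tags "var", "lam" and "app" are my own convention, not anything standard. Later sketches in this series will reuse this representation.

# One possible representation of lambda expressions as nested Python tuples:
#   ("var", "x")        a symbol x
#   ("lam", "x", M)     \x.M   (lambda abstraction with body M)
#   ("app", M, N)       M N    (application)

# The example expressions above, written in this representation
# (reading \x.z (\y.x) as \x.(z (\y.x))):
e1 = ("var", "x")                                   # x
e2 = ("lam", "x", ("var", "x"))                     # \x.x
e3 = ("lam", "x", ("lam", "y", ("var", "y")))       # \x.\y.y
e4 = ("lam", "x", ("app", ("var", "z"),
                   ("lam", "y", ("var", "x"))))     # \x.z (\y.x)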

This style of definition is not new to us. We can call this a "structural" definition - basically, it just defines the structure that lambda expressions have. The definition is also recursive (see lambda abstraction above, for example). We have seen such a definition before, say in the definition of regular expressions.

Any symbol x is a regular expression.
If M is a regular expression, so is M*
If M and N are regular expressions, so is M N.

But this does not finish the definition. We also have
x is a regular expression for the language {x}
If M is a regular expression for language L, M* is for the Kleene closure of L.
If M and N are regular expressions for languages L and L', M N is for the concatenation of L and L' (strings of L followed by strings of L', written LL').

The three lines above give the meaning for regular expressions. But this is missing for lambda calculus! There are no such lines that explain the meaning! And this is important. The concept of regular expressions depends on the concept of languages. So the "meaning" of regular expressions is given in terms of languages. But lambda calculus is intended to be so fundamental that everything else is derived from it! So there can be no such lines giving such "meanings". For the lambda calculus, the definition is it! But then what's the use? Well, the meaning is reflected in the relations and "conversions" between lambda expressions. And these "conversion" rules may be added as part of the definition. More about this next time. But it is important to understand why the definition of lambda calculus contains no lines that explain the meaning of each form (of the structure) of lambda expressions.

Table of contents

Digression - Russell's Paradox

Part I

Russell's Paradox
© Copyright 2000, Jim Loy

Let you tell me a famous story:

There was once a barber. Some say that he lived in Seville. Wherever he lived, all of the men in this town either shaved themselves or were shaved by the barber. And the barber only shaved the men who did not shave themselves.

That is a nice story. But it raises the question: Did the barber shave himself? Let's say that he did shave himself. But we see from the story that he shaved only the men in town who did not shave themselves. Therefore, he did not shave himself. But we again see in the story that every man in town either shaved himself or was shaved by the barber. So he did shave himself. We have a contradiction. What does that mean?

Maybe it means that the barber lived outside of town. That would be a loophole, except that the story says that he did live in the town, maybe in Seville. Maybe it means that the barber was a woman. Another loophole, except that the story calls the barber "he." So that doesn't work. Maybe there were men who neither shaved themselves nor were shaved by the barber. Nope, the story says, "All of the men in this town either shaved themselves or were shaved by the barber." Maybe there were men who shaved themselves AND were shaved by the barber. After all, "either ... or" is a little ambiguous. But the story goes on to say, "The barber only shaved the men who did not shave themselves." So that doesn't work either. Often, when the above story is told, one of these last two loopholes is left open. So I had to be careful, when I wrote down the story.

Now we come to a really serious attempt to solve the above puzzle: Maybe there was no barber like the one described in the story. But the story said, "There was once a barber..." So there really was a barber like that, unless the story is a lie! That is the answer, isn't it? The story is a lie. Sorry about that. I told the story of a barber who could not possibly exist. I had good motives. But I guess I told a lie.

In logic, some statements are true (Jim is nearsighted), some are false (Jim eats squash). And a collection of statements, such as our story, is either consistent or inconsistent. The following pair of statements is inconsistent:

  1. Jim likes vanilla ice cream with Smuckers Plum Jam on it!
  2. Jim does not like vanilla ice cream with Smuckers Plum Jam on it.
They contradict one another. They cannot both be true. In fact, one of them is really really false. Well, our story of the barber is inconsistent. In logic, we don't say that it is a lie. We say that it is inconsistent. "Inconsistent" is much more descriptive, and it is not a sin.

The above story about the barber is the popular version of Russell's Paradox. The story was originally told by Bertrand Russell. And of course it has a simple solution. It is inconsistent. But the story is not really that simple. The story is a retelling of a problem in set theory.

In set theory, we have sets, collections of objects. These objects may be real physical objects (marbles) or not (cartoon characters, thoughts, or numbers). When we deal with a set, we normally write it down with brackets: {A, B, C}. That set contains three letters, A, B, and C. The set {B, C} is a subset of {A, B, C}. There is a special set with no elements, the empty set {} or ø, such as the set of humans bigger than the earth, or the set of odd numbers divisible by two. Some sets contain infinitely many elements, such as the set of all even numbers.

A set can contain sets. The set {{A, B, C}, {x, y}} contains two sets, {A, B, C} and {x, y}. (It also has the empty set as a subset, by the way; every set does.) We can define the set of all sets. This set contains {A, B, C} and {{A, B, C}, {x, y}} and every other possible set. Some sets contain themselves. The set of all red marbles does not contain itself, because it contains no sets at all, only marbles. Let's say that S is a set which contains S and {A, B}. Then this is S: {S, {A, B}}. It contains two sets, itself and {A, B}. The set of all sets obviously contains itself. Well, let's construct a very interesting set, the set of all sets which do not contain themselves.

There is something wrong here. Does "the set of all sets which do not contain themselves" sound like "the barber who shaves all men who do not shave themselves?" The story of the barber was inconsistent. The set of all sets which do not contain themselves is inconsistent for the same reason. Does the set of all sets which do not contain themselves actually contain itself, or not? If it contains itself, then it cannot contain itself. If it does not contain itself, then it must contain itself. It is inconsistent.

But where did we go wrong? Let's make some lists. A list is a special kind of set. But we know what a list is. A list may be clearer in our minds than a set. We cannot actually physically make infinite lists. But we can certainly define some of them, like the list of all even numbers. So we can deal with infinite lists. We can also make lists of lists. Here is such a list:

  1. My shopping list
  2. My email address list
  3. David Letterman's list of Top Ten Whatevers

This list of lists is real. Now, if we allow infinite lists, then it is no stretch at all to produce the list of all lists, and even the list of all lists which do not contain themselves. And that list is inconsistent.

Well, maybe there are no infinite lists. There are infinite sets, for example, the set of all even numbers. And that is a list: the list of all even numbers. The concept of an infinite list is actually fairly simple.

So we have an inconsistent set. That is not all. We made no mistakes when we constructed the set of all sets which do not contain themselves. And that means that set theory is inconsistent. And that means that logic is inconsistent. And that means that all of mathematics, including algebra and geometry, is inconsistent.

It doesn't invalidate mathematics or logic or set theory. The Pythagorean theorem is still true. But there is some doubt. Kurt Gödel proved that Number Theory (and, by similar arguments, every sufficiently powerful branch of mathematics) is incomplete if it is consistent. He encoded a self-referential statement, very much in the spirit of Russell's Paradox, as a statement in Number Theory, and showed that it can be neither proved nor disproved there. This had huge repercussions in the world of mathematics. All of this leads to the following problem:

  1. There are things that are true in mathematics (based on basic assumptions).
  2. There are things that are false.
  3. There are things that are true that can never be proved.
  4. There are things that are false that can never be disproved. And that is a problem, because we cannot ever tell if something is true unless we can prove it.

Part II

Well, we've got a taste of Russell's paradox. So this is a kind of concluding part by me. If anybody has more, we might continue. Let's see what happens if a proposition P and its opposite ~P can both be proved in a system. This is relevant because, in a way, we can prove that Russell's set A is a member of A, as well as that A is not a member of A. Now, if P and ~P can both be proved, it is called inconsistency. And if this happens we can prove any proposition whatsoever! Say we have to prove Q. Now, since we can prove P, we can prove (P or Q). But we can also prove ~P. So if (P or Q) is true, and ~P, then Q must be true. So we can prove any arbitrary Q, and phut goes our logical theory!

That's why people run away from inconsistencies. To solve this problem various approaches have been taken, the most successful (and useful) being type theory. There are various type systems, but we talk about the general concept of type theory. So what do we do? We define type rules, and only those expressions that conform to the type rules are valid in the system. Now we take care to make a type system in which "the set of all sets that are not members of themselves" is not type correct. So this sentence cannot be derived at all! Does that decrease the expressive power? Well, yes. But that's ok, because most of the useful things that make sense and cause no big problems are still type correct. Interested people can look for more on this.

That is all for now; we return to our lambda calculus.

Table of contents

Enter Substitution

Though the formal definition of lambda calculus has no "meanings" associated with it, people do have things in mind when they create logical models. Lambda calculus was created to model the notion of functions that mathematics has. Therefore, \x.M is supposed to model f(x), where f(x) = M. This is called lambda abstraction - it is where a function is being defined. The other form of lambda expression, M N, is supposed to model function application (M is the function and N is the argument).

Now, how do we capture this in lambda calculus? In order to do this, lambda calculus has "conversions", where one form of a lambda expression can be converted to another. There are three such things - alpha conversion, beta conversion, and eta conversion. Let's get to these conversions later. Just bear in mind that there is such a thing, that we are going to define exactly what each conversion is, and that this will help in imparting our intuitive meaning to lambda calculus.

Now in maths, if f(x) = x + x, and we want f(2), what do we have? 2 + 2. That is, we put 2 wherever there is x. Simple enough! This is called "substitution" (as if we didn't know!). Here's the notation: M [N/x] means "substitute N for x, in M". Some examples to get used to the notation:
(a b c) [p/c] -> a b p
(\x.m n) [y/n] -> \x.m y
Well, you get the idea. This substitution is a key operation in lambda calculus. But there are a few problems:

Consider (\x.y). Intuitively, this is a constant function, that returns y irrespective of x.

Now, (\x.y)[x/y] creates (\x.x), which is the identity function, which very much depends on x! Yuk, we don't want this! This'll make substitution very ugly and complicated, and we'll have to be very careful every time we substitute! Substitution should not change the very meaning! One constant function to another constant function using substitution - ok. But this is too much! BTW, this phenomenon is called "variable capture" and we have to avoid it!

Now that you have an idea of the subtleties of substitution, later we'll define it so that there'll be no such problems. On the way, we'll analyze why the problem occurs, and that'll lead us to the notion of bound variables and free variables.

Questions and discussions

Feedback

\x.M models f(x) = M
but the above is an example of some meaning being attached to lambda calculus, to model mathematical functions. Depending upon the system to be modelled, a new meaning could be attached. Am I right?

Reply

Exactly! So "meaning" is a way used to think about and develop the calculus, but the calculus itself does not have any meaning. Yes, any meaning could be attached depending on the system to model. It is the responsibility of the one who attaches the meaning, to see whether it fits in well, is useful. But actually, logical theories like lambda calculus start with some model in mind (and not the other way round, usually). And then once all the axioms and rules are written, and the creator is satisfied, then its a play of symbols, and frequent mapping it back to the "meanings" universe, to see and evaluate the results.

Feedback

Do the uppercase / lowercase letters in lambda expressions have some significance?

Reply

Well, it's actually a matter of notation and convention, but right now, as we define it, let's say upper case letters are not used as symbols at all! But note that above you wrote "7", and you have in mind the meaning of "7" as a natural number. Similarly for "+". But "7" and "+" are not valid symbols in lambda calculus. In fact, lambda calculus is more fundamental than arithmetic and the notion of natural numbers. So we will have to "derive" the notion of natural numbers in lambda calculus, by finding a representation for them in lambda calculus. But let's not leapfrog to that point yet. All that is still to come, and will come. But it's good to remember that we are going to do that. (If we can do that, we'll be successful in formalizing arithmetic! Did you say something, Gödel?)

Feedback

doubt: (\x. x) [y/x] creates (\x. y) (substituting only after the . (dot)) - is this also variable capture?
OR
(\x. x) [y/x] creates (\y. y) (substituting every occurrence)
and this is no problem?

Reply

The first one is wrong - if we substitute, we substitute all occurrences.

The second one is not a problem. In fact, this is called alpha conversion, but lets get to that later.

The problem is in cases like (\x.y)[x/y] -> (\x.x)

Look, we want to use substitution as a primitive operation for lambda calculus; we don't want it to be so powerful (and ugly) that by some inadvertence we can change the very nature of an expression (in the above example, from an expression that models a constant function to one that models an identity function). So we plan to define substitution so that such problems do not occur.

Feedback

Does the substitution work on the whole lambda expression or only on the expression in parentheses to its left? In short, is there scoping for substitutions?

Reply

In all the examples that I have given, the parenthesized expression on the left is the whole lambda expression! In any case, the entire lambda expression to the left of the substitution brackets undergoes substitution.

Table of contents

Free and Bound Variables

Last time we just had a first look at substitution, and indeed, substitution is a basic building-block kind of operation in lambda calculus. But then there is the "variable capture" problem.

So our aim now is to define substitution while at the same time eliminating the variable capture problem. For this, the notion of free variables and bound variables is very useful. So we go right ahead and define free and bound variables, so as to be well prepared to tackle substitution.

The notion of free and bound variables is not unique to languages or lambda calculus. They are found all over mathematics. For example, take the summation notation (sigma): in the sum from i = 1 to i = n of i^2, clearly "i" is a bound variable. Well, the notion is simple enough (but useful and important nevertheless!), so let's go straight ahead and define them for lambda calculus.

Bound Variables: Let BV(M) be the set of bound variables of a lambda expression M. BV is defined as (based on the form of M):
BV(x) = {} /* no bound variables */
BV(\x.N) = {x} union BV(N) /* all the bound variables in the "body" N, and x is also bound by the lambda */
BV(M N) = BV(M) union BV(N) /* all the bound variables that occur in M and N */

Free Variables: Let FV(M) be the set of free variables of a lambda expression M. FV is defined as (based on the form of M):
FV(x) = {x} /* x is free! */
FV(\x.M) = FV(M) - {x} /* all free variables in M, but not x! (it's bound by the lambda) */
FV(M N) = FV(M) union FV(N) /* all the free variables that occur in M and N */
That's it! Simple!
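For those who like to see the same thing as code, here is a small sketch of these two definitions over the tuple representation from the earlier snippet (Python 3.10+ for match; the function names bv and fv are my own):

# Lambda expressions as tuples: ("var", x), ("lam", x, body), ("app", fn, arg)

def bv(m):
    """BV(M): the set of bound variables of m."""
    match m:
        case ("var", _):
            return set()
        case ("lam", x, body):
            return {x} | bv(body)
        case ("app", fn, arg):
            return bv(fn) | bv(arg)

def fv(m):
    """FV(M): the set of free variables of m."""
    match m:
        case ("var", x):
            return {x}
        case ("lam", x, body):
            return fv(body) - {x}
        case ("app", fn, arg):
            return fv(fn) | fv(arg)

# Example: \x.(x y) -- x is bound by the lambda, y is free
term = ("lam", "x", ("app", ("var", "x"), ("var", "y")))
print(bv(term))   # {'x'}
print(fv(term))   # {'y'}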

There's one thing to note though. See how these definitions are based on the structure or form of the lambda expression. In the definition of lambda expressions, there are three forms of lambda expressions. These definitions of free and bound variables give one line for each of the three forms. Such things are called "structural induction" or some such thing in math. Actually it's easy. Say, for example, we want the number of abstractions (i.e. lambdas) plus the number of applications in an expression. How would you write it (or program it)?

It'll be a case analysis on the structure of lambda expressions. Let's write a function foo that does this:

switch (form of expression L)
{
case x:            /* a lone symbol: no abstractions, no applications */
    return 0;

case (\x.M):       /* one abstraction, plus whatever is inside the body M */
    return 1 + foo(M);

case (M N):        /* one application, plus whatever is inside M and N */
    return 1 + foo(M) + foo(N);
}

Simple! You want just the number of lambdas in an expression? That's easy too, and it will go along the same lines (exercise for the interested reader?)!

Now we have all the ammunition to tackle substitution - let's do that next.

Table of contents

Getting a feel for Free and Bound Variables

This time, we take another look at the notion of free and bound variables, in case the definitions given last time didn't give you the intuitive feel.

As we have already seen, \x.M models a function that has x as a parameter and body M. So x is "bound" in \x.M. That is, if you had, say, (N (\x.M)), and both M and N contain x, the "x" in M is different from the "x" in N (since x has a binding "\x" which extends into M). Hope that clears it up - I could give a proper explanation with function applications, but I would be going too far ahead into beta reduction.

Consider (\x.x y (\y. x y)).
Now, if you consider the inner subexpression "(\y.x y)", y is bound and x is free. But in the entire expression, x is bound. Also, y is free in its leftmost occurrence, but bound in the inner subexpression. Get it? That is, the first y is "different" from the second.

Hope this clarifies the notions of free and bound variables in lambda calculus. We also have seen the exact definition. So we may move on to a precise, problem-free definition of substitution.

Questions and Discussions

Feedback

When you say "x is bound in the entire subexpression", do you mean that it's bound if you take the the subexpression as first evaluating (\y.x y) and then applying (\x.x y) to this result, or that it is bound in the left subexpression only.

In other words, is x bound in (\x.x y) or in the whole of (\x.x y (\y.x y))? If it is bound in the whole subexpression, isn't y bound in it too?

Reply

Both occurrences of x are bound - the "shadow" of a lambda ("\") falls as far right as it can. So even the second x falls under the shadow of the leftmost lambda.

Now, the first y is totally free. There ain't no shadow of no \y (lambda y) over it. But the second occurrence of y is bound by the "\y" that comes before it.

Table of contents

Back To Substitution

Now that we know what bound and free variables are, we get back to substitution. I didn't mention it before, but I should have.

If we have M[N/y] (substitute N for y in M), only free occurrences of y in M are substituted. It's pretty understandable why this is so. (If in doubt, ask.)

Now the problem we were talking about was in cases like (\y.x)[y/x]. This makes the constant function (\y.x) into an identity function (\y.y), and we don't want this to happen. So next, we look at when substitution is safe.

Consider, M[N/x].

  1. We do not have to worry about the bound variables in N, because even after substitution they are bound and no undesirable change in intuitive meaning happens.
    e.g. (\y.x y)[(\y.y) / x] leads to (\y.(\y.y) y), but this is ok, since the two "y"s have different bindings.
    Therefore, we only have to worry about the free variables in N.
  2. Now suppose there is an overlap between the free variables of N and the free variables of M - that's ok too! e.g. (\y. x y) [(x z) / x] leads to (\y. (x z) y). This is fine; no big qualitative change in meaning has occurred.
  3. Now, suppose there is an overlap between the free variables of N and the bound variables of M. This won't do!
    e.g. (\y. x y) [y / x] leads to (\y. y y). Nah, won't do at all - this is variable capture! This is the kind of ugly thing that'll change a constant function into an identity function just because of substitution!
So, what do we have now? Let's state it:
Substitution M[N/x] is safe as long as the bound variables of M are disjoint from free variables of N.

Now, what do we do when this happens? Simple. We can just go ahead and rename the bound variable - (\y.x y) [y/x] can now be made into (\z.y z). Look, renaming a bound variable simply doesn't matter (actually this is called alpha conversion; we'll come back to it soon)!

Ok, let's wrap up by defining substitution.

Definition: M [L/y], the result of substituting L for all free occurrences of y in M, is given by:
x [L/y] = L if x = y // simple enough: if it's the variable you want to substitute, substitute it!
x [L/y] = x otherwise // if it's some other variable, forget it!

(\x.M)[L/y] = (\x.M) if x == y // if the variable you want to substitute is bound in the entire expression, leave it alone.
(\x.M)[L/y] = (\x.M[L/y]) otherwise // well, there may be free occurrences of y in M, so go ahead and substitute

(M N)[L/y] = (M [L/y]  N [L/y]) // look for free occurrences in M and N, and do the needful.
(And then there's the "rename the bound variable in M, if required" step, as discussed above.)
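Here is a sketch of this definition as running code, over the tuple representation from the earlier snippets, including the "rename if required" step (Python 3.10+ for match; the helper names fv, names, fresh and subst are my own):

# Lambda expressions as tuples: ("var", x), ("lam", x, body), ("app", fn, arg)

def fv(m):
    """FV(M): free variables of m."""
    match m:
        case ("var", x):        return {x}
        case ("lam", x, body):  return fv(body) - {x}
        case ("app", fn, arg):  return fv(fn) | fv(arg)

def names(m):
    """Every variable name occurring in m (used when picking fresh names)."""
    match m:
        case ("var", x):        return {x}
        case ("lam", x, body):  return {x} | names(body)
        case ("app", fn, arg):  return names(fn) | names(arg)

def fresh(avoid):
    """Pick a variable name not in `avoid`, for renaming a bound variable."""
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"

def subst(m, l, y):
    """M[L/y]: substitute l for the free occurrences of y in m."""
    match m:
        case ("var", x):
            return l if x == y else m
        case ("lam", x, body):
            if x == y:                      # y is bound here: nothing to substitute
                return m
            if x in fv(l):                  # would capture a free variable of l:
                z = fresh(names(body) | fv(l) | {y})
                body = subst(body, ("var", z), x)   # rename the bound variable first
                x = z
            return ("lam", x, subst(body, l, y))
        case ("app", fn, arg):
            return ("app", subst(fn, l, y), subst(arg, l, y))

# The problematic case (\y.x y)[y/x]: the bound y gets renamed, so no capture.
print(subst(("lam", "y", ("app", ("var", "x"), ("var", "y"))), ("var", "y"), "x"))
# -> ('lam', 'v0', ('app', ('var', 'y'), ('var', 'v0')))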

Table of contents

Discussions and FeedBack

Feedback

To elaborate the line
Substitution M[N/x] is safe as long as the bound variables of M are disjoint from free variables of N.

M[N/y] (substitute N for each free occurrence of y)

For a variable common to both M and N:

free in M, free in N: then no problem.

free in M, bound in N: then no problem (not much change in meaning; boundness in N will be local).

bound in M, free in N: then a problem (some free variables will now become bound).

bound in M, bound in N: then no problem (boundness in N will be local).

Feedback

In an earlier message...

doubt: (\x. x) [y/x] creates (\x. y) (substituting only after the . (dot)) - is this also variable capture?

The reply given then
"This is wrong. If we substitute, we substitute all occurances."

...and today...

If we have M[N/y] (substitute N for y in M), only free occurrences of y in M are substituted. It's pretty understandable why this is so. (If in doubt, ask.)

Contradiction?

Reply

Oh, ok. Not really much of a contradiction, but an incorrect explanation by me.

About (\x.x)[y/x] creating (\x.y): I should have said that this is wrong because we do not substitute bound variables at all (and not because "all occurrences are substituted"). But we can still have \x.(x[y/x]). This'll give (\x.y). And this is ok, because the substitution did not change the nature of the expression it acted upon. The substitution acted only upon x and changed it to y. The fact that there was a \x outside does not matter (the substitution cannot see it!). All we want is that substitution must not change the nature of the expression it acts upon. Now, if one deliberately wants to do the substitution as given above (for whatever evil purpose they have), that's ok. So all we want to say is that after substitution, the nature of a lambda expression (which underwent substitution) remains the same. Hope I have been clear.

Note that such an ugly instance of substitution will never come up as we unroll the definition of substitution over the different forms of lambda expressions. Such a thing will only come up when explicitly written like that (hope you understood the above lines! Read them again!).

Now, about,

"If we have M[N/y] (subsitute N for y in M), only free occurences of y in M are substituted. Its pretty understandable why this is so. (If in doubt, ask.)"

I think, all of you have understood this statement. The only problem was that it seemed to contradict with the first one, which, I hope, I have cleared up.

If the shadow of doubt lingers or, as they say, looms (which I think it does), feel free to keep asking.

Table of contents

Enter Conversions

We move on ahead to those things called "conversions", which let you work with lambda expressions. Conversions are basically rules for transforming lambda expressions, such that the transformed expressions are in some sense "equivalent" to the original expression. This "equivalence after conversions" is what captures the intuitive meaning of the lambda calculus.

The trivial rule for equivalence is that two lambda expressions are equivalent if they are exactly identical. But that doesn't help us make any progress, does it? So we invent a new concept of "equivalence", and say that we will consider two lambda expressions equivalent if one can be "converted" into the other (or both can be converted into the same third expression; more about this later).

Okay, so there are three conversions - alpha, beta and eta (that n-like symbol - remember the one we use for efficiency?).

Today, we'll keep it short and simple and deal only with the first one, alpha. Actually we already know it, but let's give it a name and official recognition.

The alpha conversion rule is: (\x.M) can be alpha converted to (\y.(M[y/x])), provided y does not occur free in M. This means that the exact name of the bound variable does not matter. So (\x.x), (\y.y) and (\z.z) are all "equivalent" functions.
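A throwaway illustration of the same idea in Python (only an analogy, of course; Python functions are not lambda expressions):

# \x.x and \y.y behave identically: the bound name is irrelevant.
id_x = lambda x: x
id_y = lambda y: y
print(all(id_x(n) == id_y(n) for n in range(5)))   # True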

That's it. I stop here. This has been simple and will give you the breather you need so much.

Note that we have understood what "conversions" are, and that is good. We won't just learn three definitions and wonder what they are all about. In fact, if you can come up with a completely new conversion, and say lambda expressions are equivalent under this conversion, you'll have a completely new theory. But the essence here is: does the conversion capture some intuitive notion? Is it useful? Does accepting this extra rule of equivalence lead to more interesting results?

Table of contents

The Beta Conversion (now that's a real conversion!)

Now, we come to beta conversion. It's time to roll your sleeves up!

Beta conversion captures our intuitive notion of functions and function application, the one we have been sick of secretly harbouring in our minds. Out with it!

Definition - Beta Conversion: ((\x.M) N) on beta conversion leads to M [N/x]

This is function application! Let L be (\x.g x). So, (L a) is (g a).
This is like, Let L(x) = g(x). So, L(a) = g(a) - function application!

Now we know why (\x.x) is an identity function. ((\x.x) a) is a, ((\x.x) b) is b, ((\x.x)(\x.x)) is (\x.x). When I say "is", I assume that lambda expressions are equivalent after beta conversion. So whatever you give to the identity function, it gives it right back. Also, the parameter can be any lambda expression! Since in lambda calculus everything is a function (maybe with 0 parameters), higher order functions and all that are no big deal!

Now assume that you are using lambda calculus as a programming language; what would you want the interpreter to do? You'd want it to keep doing beta conversions until it can't do them anymore. So, I hope that gives you an idea of why beta conversion is the "real thing". But wait! Is this all there is to the interpreter? Will it ever reach a state where no more beta reductions are possible? Well, now comes the scary part - non-termination!

Consider the lambda expression, our friend identity function's big brother -

B = (\x.x x)
So, B a = a a. B f = f f.

We can define such a B, since lambda calculus is untyped - no type restrictions! So B represents a function that takes a parameter and applies the parameter to the parameter itself. Now, what happens if B is applied to itself? It'll apply itself to itself. But that's where we started!

(\x.x x) (\x. x x) on beta conversion gives (\x. x x) (\x.x x) - same thing!

So our interpreter will do the same thing again and again ad infinitum, and no progress.

Now, B can cause infinity in time, but there's more. Let's define a new one - Big B ;o)

Big B is (\x.x x x) - ok, you get the idea, but I'll write it out anyways. Now apply Big B to himself: (\x.x x x)(\x.x x x) -> (\x.x x x) (\x.x x x) (\x.x x x) -> (\x.x x x) (\x.x x x) (\x.x x x) (\x.x x x) -> ... it keeps growing!

Boy, there goes space too!

That, gentlemen, is beta conversion for you. People also call it beta reduction - but it ain't always a "reduction", is it?
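To see both faces of beta conversion - ordinary function application and non-termination - here is a quick Python analogy (Python's own lambdas and application stand in for beta reduction; B is our friend from above):

identity = lambda x: x
print(identity(42))          # 42
print(identity(identity))    # gives back the identity function itself

# B = \x. x x  -- self-application
B = lambda x: x(x)
# (\x.x x)(\x.x x) never reaches an answer; Python signals this by
# running out of stack rather than looping quietly forever.
try:
    B(B)
except RecursionError:
    print("no result: the reduction never terminates")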

Table of contents

The Eta Conversion

This is the last of the conversions, and it's no big deal. In fact, it's not even mentioned often. BTW, eta is the n-like symbol - the one we use for efficiency.

I'll go right ahead and define it.

Definition: eta conversion. (\x.M x) on eta conversion gives M, provided M does not contain x as a free variable.

That means if M does not depend on x, (\x.M x) and M "mean" the same. That is, both of these, when applied to N, will give the same expression on beta reduction. (Easy to check.)
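Here is that "same result on every argument" idea in Python terms (again only an analogy; the names f and g are mine, with g being the eta-expanded form of f):

def f(x):
    return x + 1

g = lambda x: f(x)     # "\x. f x" -- the eta-expansion of f

# Behaviourally the same: same result on every argument we try...
print(all(f(n) == g(n) for n in range(10)))   # True
# ...but they are not literally the same object / the same expression.
print(f is g)                                 # False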

This conversion embodies a principle called extensionality. That is, if two functions give the same expression when applied to the same argument, the two functions are equivalent. But as the paper I am referring to says, "In some situations, this principle (and eta-conversions) are dispensed with." Actually, I don't see why this is so.

My guess is that this principle is okay for the simple lambda calculus. But once we allow primitive operations like numbers and arithmetic, it becomes difficult (and impossible, in the general case) to check whether two functions, given the same argument, give the same result.

Another issue may be non-termination. If two functions lead to non-termination, or infinite expressions, should they be considered equivalent, or does the equivalence depend on the specific expression involved in the non-termination? I am not sure whether such conclusions regarding equivalence can be drawn when infinities come into the picture. Perhaps that is why the principle of extensionality is "dispensed with". I'll elaborate a bit on this. We know, for example, that

(\x.x x x) (\x.x x x) /* our friend Big B */

leads to infinities and non-termination. Now, the above expression is a concise and finite way of writing the infinite expression which we would obtain by successive beta conversions. But given an arbitrary infinite expression, do we know how to write a concise and finite description like we do for Big B, or in fact, do we know that such a finite description even exists? Actually, I do not know the answer to this, but I think this gives an idea of why people chose to dispense with the principle of extensionality - it's too much trouble.

This is the last conversion. Next time, we move on, armed with all the machinery there is, to play with lambda calculus, and look at the insights that we can obtain, by tinkering around with lambda expressions using these tools.

Table of contents

Normal Forms

We said that a lambda calculus interpreter will keep making beta conversions until it cannot do so any more. A bit more about that today. Beta conversions and eta conversions are also called "reductions", because in a sense they simplify the expression. An alpha conversion is not a reduction, because it does not simplify anything - it only renames a bound variable.

If no more reductions can be applied to a lambda expression, it is said to be in "normal form". For example, (\x.x) is in normal form, because no reductions are applicable. An expression is said to "have" a normal form if, after applying reductions, it can be brought to a normal form (that is, you reach a point where you cannot apply any more reductions). So, ((\x.x) y) has a normal form, which is y. But we have seen that our friend B applied to itself, ((\x.x x) (\x.x x)), has no normal form, nor does Big B applied to itself. BTW, this self-application of B is denoted by omega - the same symbol as for ohms.

Table of contents

Currying (and some Bracketing Conventions)

All lambda functions seem to take only one parameter. What do we do when we want a multiparameter function? Answer - use that beautiful thing called "currying", so called after Haskell Curry.

So if you want a two-parameter function f(x, y) = L, you can write it as \x.(\y.L)

L will have x and y as free variables. Now if the entire expression (\x.(\y.L)) is applied to an argument, say N, we get another function that needs one parameter. When that is applied too, you get f(x, y). Currying is very popular in the functional programming world because of the convenience it offers. But it's not just that: the idea that a multiparameter function, on being supplied some of its parameters, gives a function that takes the rest of the parameters is also a very useful way of thinking, and useful in designing programs.

I wish I could think of some nice and simple examples that would show you how cool currying is, but I can't think of anything suitable. Anyways, I'll refer you to a paper. It could have been better, but it's worth a read nevertheless. It's called "Fields in Physics are like Curried Functions Or Physics for Functional Programmers" by Gary T. Leavens. I take the liberty of attaching it along. (You can come to me for a printout.)

Okay, that's about currying. BTW, it's just named after Haskell Curry - he didn't think of it first.

Before I go, I'll just give the bracketing conventions.

A two parameter function (\x.(\y.M)) can be written as (\x y. M)
In general, (\x1.(\x2....(\xn.M)...)) maybe written as \x1 x2 ...xn.M

Also, \x.\y.M N means \x.(\y.M N) and not \x.(\y.M) N

Therefore, \x.\y.M N does not admit any beta reductions, but \x.(\y.M) N does. That is, N can be substituted for y wherever appropriate.

Hope the bracketing conventions are clear.

To summarise, \x.M N abbreviates \x.(M N) rather than (\x.M) N. Also, x y z abbreviates (x y) z rather than x (y z).

Before we take leave, here's what an application of a curried function looks like:

(\x y. L) M N on beta gives (\y.L [M/x]) N which in turn on beta gives L [M/x] [N/y] - just as desired.
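And a tiny Python sketch of currying, for those who want to run something (the name add is mine, and + is used only so that there is something to observe; pure lambda calculus has no built-in numbers):

# f(x, y) = x + y, written in curried form: \x.\y. x + y
add = lambda x: lambda y: x + y

add2 = add(2)        # supplying only the first argument yields another function
print(add2(3))       # 5
print(add(2)(3))     # 5 -- the lambda calculus style: (\x y. ...) 2 3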

Table of contents

Looking Back

Here's a great summary by Sachin; this is just to make it part of the "official" lambda series.

In fact, such summaries are very useful for getting an overall view of what has happened so far - sort of a panoramic view of the whole lambda territory that we have been through.

Also, I must add, that this is a great stage in our studies of lambda calculus, to summarize - we have finished what can be called the "definition" of lambda calculus, and we will be moving on to its various insightful applications. So this just fits in perfectly.

Thanks a lot for the contribution.

Sachin's summary follows:

Lambda Calculus:

History

Alonzo Church (machinery to formalize set theory)
Computer Science and lambda calculus => functions in lambda are functions that are computable.

Turing
Russell's paradox
barber ?
inconsistent

Structural definition of lambda expressions: x, \x.y, .... where's the meaning part ?,
meaning lies in the conversions and relations between expressions modelling functions in mathematics

lambda abstraction: \x.M equivalent to f(x) = M
function application: M N denotes M=function and N=argument

substitution:
M[N/x] substitute N for x in M
variable capture: changes the form to something not acceptable
free and bound variables

For M[N/x] below are the states of variables common to both M and N

free in M, free in N: then no problem
free in M, bound in N: then no problem (not much change)
bound in M, free in N: then a problem (some free variables will now become bound)
bound in M, bound in N: then no problem (boundness in N will be local)

Conversions
alpha
\x.M can be converted to \y.(M[y/x])
converting say f(x) to f(y)

beta
function application (we've seen a glimpse of that)
((\x.M) N) on beta conversion gives M[N/x]
helpful: go on doing beta conversions until the real thing comes,
normalization
problem: sometimes non-termination, omega

eta
(\x.M x) on eta conversion gives M, provided x is not a free variable in M
extensionality
equivalent functions
what about non-terminating functions - always equivalent, or only if the same up to some level?

Currying

bracketing conventions
x y z taken as (x y) z
(\x.(\y.M)) written as (\x y .M)
example of beta conversions step by step
(\x y.L) M N
(\x. (\y.L)) M N
(\y.L)[M/x] N
(L)[M/x][N/y]
L[M/x][N/y]

Table of contents

Encoding Data in Lambda Calculus (starting with Booleans)

Now that we have the entire lambda machinery in place, it's time to move on and use it. The various notions in mathematics (though now we'll stick to those useful in programming) must be encoded in lambda calculus. Things get interesting now! All we have is a definition and three conversions, and we intend to encode all that we know using these things!

Let's start with how to encode boolean values in lambda calculus. We choose the booleans because there are just two of them. Let's tackle this first and then move to infinite data types like natural numbers. So how do we go about encoding booleans? Well, let's first see what we know about booleans. We know that there are two of them - true and false. And they have some meaning (which is nice and secure in our minds). Then there are things like if, and, or, not that use these values (so in a sense, they impart meaning to the tokens "true" and "false").

Ok, a little philosophical chatter follows. Our mind learns incrementally from experience. Then it extrapolates and develops new ideas. By experience and thinking, we have evolved the notions of true and false. These notions are not formalized in the mind. They are things that work in the most often occurring cases - "This is an apple" is true or false, depending on "this". Our minds also evolve the concepts of "if", "not", "and", "or" and all these things. But these notions need not be consistent. Our minds can live with that. So when Socrates says "whatever Plato will say next will be true", and Plato immediately says "Socrates just lied", we are baffled. Most of the ideas we have are those that work in the most often occurring cases. When we push them to the limits of their domain, we are baffled, and think of new ways to define the idea. Formalization is a similar effort. We want to define things completely - they may not match with what our minds have learnt by experience, in some cases, but we want them to be consistent. In fact, one of the big uses of formalization is that we can check how much of a concept we really understand. In formalization, nothing is implicit or assumed - everything must be put on paper. The system is all that which is on paper, and nothing but that. But the unfortunate side effect is that we end up with a whole lot of things on paper, and it is a big mess! So we try to keep the system small and the axioms few, and try to encode everything in that. Since we had a hundred things in mind, but we put only a few things on paper, and try to build everything from that, things start to become unintuitive. But formalists still prefer to keep the number of primitive symbols and operations to a bare minimum - they can develop whatever intuition is required!

The point of all this discussion was to tell you that adding symbols called "true" and "false" to lambda calculus is not what we aim for. If that's the path we choose, we can add symbols for 1, 2, 3, and we have natural numbers; add symbols for integration, summation and all that, and we have all of mathematics! But no! We have frozen our definition of lambda calculus here! Nothing more is to be added!

So how to encode true and false? true will be some lambda expression and so will false. Both will be functions. "if" should work with them. So we are looking for something that satisfies:

if true M N = M
if false M N = N

Note that, "and", "or", "not" and other logical operations can be defined in terms of "if".

There will definitely be a lot of encodings that satisfy the above laws, but the simplest and the most commonly used are:

true = \x y. x /* a function that takes two arguments and always returns the first */

false = \x y. y /* a function that takes two arguments and always returns the second */

if = \c x y. c x y /* just apply the condition (c) to x and y, and you have it! (if c is true, it'll return the first arg: x, similarly false will work too) */

Now, "and p q" is just "if p then q else false"

and = \p q. p q (\x y. y)

("or", " not" and all that is exercise for the interested reader).

Well, now we have an idea of how things can be encoded in lambda calculus. The aim is to encode entire mathematics! Well, we have made a beginning (not that we ever are going to reach the end!).

Another interesting thing in the way we have encoded booleans is that the data seems to carry its control along with it. So the branching of control in if-statements, is modelled in the definition of the booleans itself. Why this happens is that in lambda calculus everything is a function and we don't intend to have special symbols with special meanings. So the only way to encode a concept, is to capture its behaviour. So booleans carry their control with them - that is what makes them boolean values!
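
If you want to play with these encodings on a computer, they translate almost literally into Python lambdas. This little sketch is an editorial aside, not part of the original mails; the names cond and and_ are made up here (since "if" and "and" are Python keywords):

true  = lambda x: lambda y: x                    # \x y. x
false = lambda x: lambda y: y                    # \x y. y
cond  = lambda c: lambda x: lambda y: c(x)(y)    # the "if" above
and_  = lambda p: lambda q: p(q)(false)          # \p q. p q false

assert cond(true)("M")("N") == "M"
assert cond(false)("M")("N") == "N"
assert and_(true)(false)("M")("N") == "N"        # and true false behaves like false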

Table of contents

Encoding Natural Numbers in Lambda Calculus

After booleans, we proceed to see how to encode natural numbers. There are various encodings in use, but presented here is the encoding originally used by Church. It seems other encodings are preferred today; one of you can take up that investigation (they pertain more to second order lambda calculus).

So, we want to encode 0, 1, 2, 3,....
This is how Church did it.
zero = \f x. x /* A function that takes two arguments and returns the second */

Incidentally, this is nothing but our friend "false", ("of course!" say the C programmers).

one = \f x. f x /* A function that takes two arguments, and applies the first to the second. */

two = \f x. f (f x) /* Two applications */

In general the number n has n applications. (The second argument is used to represent zero).
Its like the successor function used when defining natural numbers. 0 is a natural number.
if n is a natural number, successor(n) is a natural number.
So zero is 0, and 1 is successor(0), 2 is successor(successor(0)) and so on.

So the number of "f"s gives you the number being represented.

Get used to this encoding, we'll discuss how to write arithmetic operations, next.
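
As an editorial aside (not from the original mails), here are the same Church numerals as Python lambdas; to_int is a helper made up for this sketch which recovers an ordinary integer by using "add one" as f and 0 as x:

zero = lambda f: lambda x: x            # \f x. x
one  = lambda f: lambda x: f(x)         # \f x. f x
two  = lambda f: lambda x: f(f(x))      # \f x. f (f x)

to_int = lambda n: n(lambda k: k + 1)(0)
assert to_int(zero) == 0 and to_int(two) == 2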

Table of contents

Arithmetic on Natural Numbers in Lambda Calculus (Addition)

Now we know how we encode natural numbers:

0 is \f x.x /* apply f zero times */
1 is \f x. f x /* apply f once */
2 is \f x. f (f x) /* apply f twice */
3 is \f x. f (f (f x)) /* apply f thrice */

ok, you get the drift.

Now we have to work with these numbers. Lets start with addition. We have to write a function add, such that,
add m n = m + n

So we want the answer to be an expression, that takes an f, and an x, and applies f to x, (m+n) times.

So, it'll look like

add m n = \f x. M, where M will be an expression in terms of m, n, f and x.
or rather, add = \m n f x. M

I'll guide you to the answer now. But you might want to think a bit about it first (do it). Actually, I am not sure whether to try and take you through a thought chain that'll lead you to the answer, or give you the answer and leave it to you to figure how we came there. I guess I'll lead you through it, but keep a free and open mind, there may be other ways to get there - don't let my path to the answer narrow your thinking down, to one "method".

Now, we know one is \f x. f x, two is \f x. f (f x). Lets put our friend successor in place of f, to make it easier to read.

So,
one is \x. succ x
two is \x. succ(succ x)
And so on. To make it even easier, lets remove the \x, and put an underscore in place of x, to show that its an argument.

So,

one is succ _
two is succ(succ _)
three is succ(succ(succ _))

Ok, think again? Are you getting the answer? Here's a hint - if you look at say, three, it "adds" three to the number which is in place of the underscore. So if you put 1 in that underscore, you get 3 + 1. If you put 2, you get 3 + 2. If you put n, you get 3 + n. That means, once you have supplied the first argument (f) to a number (say m), you get a curried function (add m), that adds m to its argument. All you have to do, is supply the number to which you want m to be added.

So,

if you have m, and supply the first argument f, just give second argument n, and you get m + n!

Now, have we got the answer? Well, I hope so! I'll write it down anyway,

add = \m n f x. (m f) (n f x) /* the final answer! */

As you see, (m f) is our curried function (add m), and it is applied to (n f x), that is, n with both its arguments supplied as required. With our bracketing conventions, we can write add as,
add = \m n f x. m f (n f x)

Now, lets check,
add = \m n f x.(m f) (n f x)
Given a "m" and a "n",
we get \f x. (m f) (n f x)
which expands to \f x. (m f) (ffff...n times x)
and then, \f x. (ffff...m times) (ffff...n times x)
which is \f x. (ffff...m+n times) x
which is nothing but (m + n)!

Well, hope I have given you a start, think about it a bit and you'll get the feel.
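
Again as a quick editorial sketch (not part of the mails), add written as a Python lambda, reusing one, two and to_int from the numeral sketch above:

add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))   # \m n f x. m f (n f x)
assert to_int(add(two)(one)) == 3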

Well, next we have to tackle multiplication. It'll be something like, mul = \m n f x. M

Can you get the M? I'll wait a few days for some replies, before moving on.

Do try.

Discussions and Questions

Feedback

mul = \m n f x.(m (n f)) x
Given a "m" and a "n",
we get \f x. (m(ffff ... n times)) x
let fff...ntimes be called N
and then, \f x. (NNN.....m times) x
which is \f x. (ffff...n*m times) x
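
A quick numeric check of this mul, as an editorial aside reusing the Python sketches from earlier (one, two, add, to_int); it is not part of the original exchange:

mul = lambda m: lambda n: lambda f: lambda x: m(n(f))(x)   # \m n f x. (m (n f)) x
assert to_int(mul(two)(add(two)(one))) == 6                # 2 * 3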

Feedback

Now you said that add = \m n f x.(m f) (n f x),
which means that f is applied to the right-most term m times. Which means that the right-most term is incremented m times, since f is succ. The right-most term, of course, is f applied to x n times, which is the number n. Let's write add using succ instead of f:

add = \m n succ x.(m succ) (n succ x)

For multiplication, we want the add operation itself to be applied m times, so we would want to substitute addn for succ.

mul = \m n succ x.(m addn) (n succ x)

Reply

Oops! A bit off there. addn takes a number as a parameter, right? But if the above expression is expanded, the innermost thing will be addn (n succ x) - but (n succ x) is not a natural number! It is not of the form \f x.M! (n succ x) contains no lambdas! So, what we have here is a type error! So mul should be,

mul = \m n succ x. (m addn) zero, which is, addn applied m times to zero.

Feedback continued

Where addn can be defined as:

addn = \n f x. n f x

Reply

Oops! This is wrong again! We want addn to add n to any number it gets, right? What you have written there, is the identity function for natural number n. i.e.

addn n = (\n f x. n f x) n = \f x. n f x = n

Actually, addn is a curried form of add (with its first argument fixed to n), and in its body n should occur free.

So, addn = \p f x. (n f) (p f x) /* we are adding n to the parameter p */

Feedback continued

So, is this it?

mul= \m n f x. (m (\n f x. n f x))(n f x)

Reply

Nope. We have,

mul = \m n succ x.(m addn) zero

so mul = \m n.m (\p f x.(n f) (p f x)) (\f x.x)

Well, mul in this form seems a lot more complicated than what Sachin has sent. But they both are equivalent (I am not able to construct a proper proof now, will try harder when I am not so tired! But it doesn't seem so easy).

But the principle here is important. "succ" in m's structure is replaced by "addn", and you get multiplication by n! This is something like a homomorphism. Keep the same structure, but change the trusses and the hinges, and you get something else! I was going to leave exponentiation (m raised to n) as an exercise, but if you follow the same route to get exponentiation, you put muln in place of succ! So, you exploit the fact that any natural number n is just n applications of an operation over a base. For addition, you change the base to m. For multiplication, you change the operation (to addn). For exponentiation, you change both the base (to one) and the operation (to muln).

Feedback continued

Can this be written as:

mul = \m n f x. (m (n f x))(n f x) ??

Or even:

mul = \m n f x. (m n) (n f x) ??

Not sure if this is correct, but it seems so somewhat intuitively, since it's effectively obtained from add, except that we substitute n for f in the left term. So instead of "increment", we have "add n times" which is what multiplication is.

Reply

Well, there you go! Absolutely right.

Can anybody furnish a proof of how the two forms of mul are in fact equivalent? I'll try too.

Feedback continued

Haven't checked any text as yet. I thought I should really do some *work* here instead of simply referring to a book :)

Let me know if I am close...

Table of contents

Encoding Pairs in Lambda Calculus

Having finished seeing how to encode booleans, and natural numbers, we move on to see how to encode pairs of data. Its pretty simple.

pair = \x y f. f x y

So pair two seven = \f.f two seven
where two and seven are encoded using our encoding for our natural numbers.

Now, what are the operations on pairs:

  1. Building a pair.
  2. Extracting the first element.
  3. Extracting the second element.
For the first one, that is building a pair, we have our function pair.

For extracting the first element, we need a function that takes a pair, and returns the first element.

fst = \p. p true
where true is our old friend \x y. x

Its easy to see how this'll work. p is a pair. Look at the definition of pair. You see that p is something that takes a function (of two parameters) and applies it to the two elements of the pair. Now the function that we supply is true, which returns the first of its two arguments. So what we'll get is the first element of the pair.

Lets see how this works:
if we build that pair of two and seven, we have (\f. f two seven).
Now to extract the first of this:

fst (\f. f two seven) = (\p. p true) (\f. f two seven) /* by definition of fst */
= (\f. f two seven) true /* by beta reduction */
= true two seven /* by beta reduction */
= (\x y. x) two seven /* by definition of true */
= two /* by beta reduction */

Its easy to define the function to extract the second element of a pair.
snd = \p. p false

So we can now build pairs of data - note that it's a compound data type. So we can have pairs of numbers, pairs of booleans, a number and a boolean, pairs of pairs - anything. So we sure are making progress.
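
As with booleans and numerals, here is an editorial Python sketch (not from the mails), reusing true and false from the boolean sketch; the element values 2 and 7 are plain Python values, just for the check:

pair = lambda x: lambda y: lambda f: f(x)(y)   # \x y f. f x y
fst  = lambda p: p(true)                       # \p. p (\x y. x)
snd  = lambda p: p(false)                      # \p. p (\x y. y)

p = pair(2)(7)
assert fst(p) == 2 and snd(p) == 7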

I'll leave you with an interesting exercise; thinking about it is useful, insightful and will help you understand the material to come next, better. Write the factorial function - a function that'll take a natural number and return its factorial.

Table of contents

The Factorial Function

Ok, lets dedicate a whole part to writing the factorial function. I emphasize this because factorial is a common example of a recursive function. Recursive functions in lambda calculus are a very interesting topic, and we'll be dealing with general recursion soon. Right now, lets warm up by attempting to write the factorial function. Whenever you feel that you've got a hint, stop reading, and try to figure it out yourself.

Here's what we know about the factorial function:

  1. The factorial function will take a natural number and return another natural number.
  2. The factorial of a number n, can be easily determined, if you are given two things - the factorial of its predecessor, and the number n itself. ("Two things", hmm, sounds like we can use our encoding of pairs).

Ok, think think think. How is a natural number represented? It is a function that takes two things - a function (f), and a value (x), such that the function can operate upon this value. We want to use this representation to get the factorial.

But what about recursion? We know that fact(n) = if n = 0 then 1 else n * fact(n-1)

How do we encode this recursion? Hey, but a natural number is represented as f(f(f(f(x)))) - that is a kind of recursion, isn't it? We have to use it somehow. If only we could decide upon an "f" and an "x"!

What do we want f to do? We want the nth f, to use the value returned by the (n-1)th f, to give the factorial of n. What will an f require? It'll require the factorial of (n-1). Yes, but is that enough? No, it also needs a way to keep track of n. Bingo! Let f return both these things in a pair - a pair whose first component is n, and the second component is factorial(n-1). Cool! So we've decided on the type of x too! "x" gotta be a pair too!

What should x, be? When the first f is applied to x, we want it to give a pair (2, factorial(1)) so that it can be used by the second f. Right, but what should the first f, get? It should get (1, factorial(0)), which is nothing but (1, 1). Great! So we have fixed x! Now we can focus completely on f.

f must take a pair (n, factorial(n-1)) and give a pair (n+1, factorial(n)), which is nothing but (n+1, n*(factorial(n-1))).

So, f is nothing but something that takes a pair p, and returns another - lets write it.
f = \p. pair (succ (fst p)) (mul (fst p) (snd p)).

In case you are in doubt, "fst" and "snd" give the first and second elements of a pair. "mul" is multiplication, and "succ" gives the successor of a natural number.

Ok, lets write all these functions in lambda terms:

pair = \x y f. f x y
fst = \p. p (\x y. x)
snd = \p. p (\x y. y)
mul = \m n.m (\p f x.(n f) (p f x)) (\f x.x)
succ = \n. (\f x. f (n f x))

So our
f = \p. pair (succ (fst p)) (mul (fst p) (snd p)) becomes,
f = \p. (\x y f. f x y) ((\n. (\f x. f (n f x))) ((\p. p (\x y. x)) p)) ((\m n.m (\p f x.(n f) (p f x)) (\f x.x)) ((\p. p (\x y. x)) p) ((\p. p (\x y. y)) p))

Don't try too hard to read this, I have just substituted for pair, succ, fst, snd, and mul.

Ok, we have got f, but we don't want a pair! We want just the factorial.

Therefore,
factorial = (\n. snd (n f x))
where f = the f we have found above,
x = (1, 1) which is pair (\f x.f x) (\f x.f x) which is (\x y f. f x y) (\f x. f x) (\f x. f x)

Phew! Lets write it all!

factorial = \n. (\p. p (\x y. y)) (n (\p. (\x y f. f x y) ((\n. (\f x. f (n f x))) ((\p. p (\x y. x)) p)) ((\m n.m (\p f x.(n f) (p f x)) (\f x.x)) ((\p. p (\x y. x)) p) ((\p. p (\x y. y)) p))) ((\x y f. f x y) (\f x. f x) (\f x. f x)))

Done!

Of course, if we perform beta-reductions, the expressions might get smaller! But we aren't machines, are we?! If anybody can make like an automaton and do it - great! Otherwise, as long as you've understood it all, the goal's achieved.
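
Here is an editorial Python sketch of the same pair trick (not from the mails). To keep it readable it cheats a little: the components of the pair are plain Python integers with + and * instead of Church numerals with succ and mul, but the shape is exactly the one above - apply the step function n times to the pair (1, 1) and take the second component. pair, fst and snd are from the earlier sketch; three is a Church numeral made up here:

step = lambda p: pair(fst(p) + 1)(fst(p) * snd(p))   # (n, (n-1)!) -> (n+1, n!)
fact = lambda n: snd(n(step)(pair(1)(1)))            # n is a Church numeral

three = lambda f: lambda x: f(f(f(x)))
assert fact(three) == 6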

Table of contents

Before we move on - Introspection

The purpose of this series has not been just to learn to work with lambda calculus, but to understand the underlying principles and facts, which one wouldn't normally find in a text-book kind of treatment of lambda calculus. So in my opinion, we must feel free to take detours or plunge into great depths about something interesting that we find on the way. That is why I gave some history of set theory and lambda calculus, mentioned Russell's paradox, and such like. The response to the lambda mails has suddenly gone down, so instead of ignoring it, lets see what might have caused this. The primary difficulty has been our encoding of natural numbers and operations on them. We don't seem to have quite understood it. Maybe we got some answers, maybe we didn't, but all in all, we aren't so comfortable with it. There's also the fact that the factorial function that we've written is actually really important, because it's an example of an important class of computable functions called the "primitive recursive functions", and we must understand that. Also, our encoding of natural numbers may seem quite arbitrary ("ok, we can write the successor (plus1) function easily, but how do we write predecessor? The encoding sucks!"). Well, the argument is right, but actually, it seems to be a standard way to do things, and lets see why. So we stop lambda calculus, per se, for some time, and go to the mathematical theory of Recursive Functions. Of course, it is still very much related to lambda calculus, only written using the mathematical style (not the lambda calculus style).

So I hope everybody attacks this with renewed vigour, and there are a lot more discussions, in the familiar territory where functions are written as "f(x)", and not "\x.M". But I must also caution you that it isn't easy. Well, few things are easy at the start. But that is exactly where such a mail series helps! We can discuss and harass each other, until we make sure that all of us understand!

On to Recursive Function Theory tomorrow! I must also tell you that most of the material will be copied from an article (no, I won't say which one yet!). But you'll get it in bits and pieces. And we should move ahead only after satisfying ourselves that we have understood. Also, if I get no mails for a day or two after a posting, I assume that everybody has understood. If anybody wants me to wait, feel free to let me know. And if anybody wants to go back to something that he thinks the others have understood, even if it's too far behind, still feel free to go back - lets exploit this mail series method to its fullest!

Table of contents

Starting Recursive Function Theory

Most of the material here will be pinched from a page by Peter Suber that you can find here: http://www.earlham.edu/~peters/courses/logsys/recursiv.htm I'll add my comments where I think they are required. Hope this'll be a nice educational outing, out of the lambda calculus town, but not too far!

Recursive Function Theory...

Recursive function theory, like the theory of Turing machines, is one way to make formal and precise the intuitive, informal, and imprecise notion of an effective method. It happens to identify the very same class of functions as those that are Turing computable. This fact is informal or inductive support for Church's Thesis, asserting that every function that is effectively computable in the intuitive sense is computable in these formal ways.

[Note that recursive functions here means a particular class of functions, and not "functions who call themselves" kind. - Srineet.]

Like the theory of Turing machines, recursive function theory is vitally concerned with the converse of Church's Thesis: to prove that all the functions they identify as computable are effectively computable, that is, to make clear that they use only effective methods.

["Effective" here, seems to mean that it can be done in finite time. Not too sure though. - Srineet.]

Recursive function theory begins with some very elementary functions that are intuitively effective. Then it provides a few methods for building more complicated functions from simpler functions. The building operations preserve computability in a way that is both demonstrable and (one hopes) intuitive. The initial functions and the building operations are very few in number, and very simple in character, so that they are entirely open to inspection. It is easy to trust the foundation, and by its nature if you trust the foundation you trust the superstructure it permits you to build.

Because recursive function theory was developed in part to capture the intuitive sense of effectiveness in a rigorous, formal theory, it is important to the theory that the class of recursive functions can be built up from intuitively effective simple functions by intuitively effective techniques. Ironically, in my view, no author that I have seen expounds recursive function theory for beginners so that it really looks as intuitive as it intends to be. My intention here is to make the theory intuitively clear without sacrificing rigor.

Table of contents

Recursive Function Theory - the Initial Functions

The Initial Functions:

Logicians differ in their choice of the set of elementary functions that comprise the ultimate or primitive "ingredients" of all other functions. Picking them is comparable to picking axioms for a formal system that is to capture a predetermined body of mathematics. There is some slack, but not complete freedom.

If we continue the comparison to axiomatics, then we must revert to an ambition of axiomatics almost completely abandoned in the 20th century: the ambition to make the axioms "self-evident truths". The elementary functions must be intuitively effective or computable. If they are not, one will still get an interesting, perhaps an identical, set of resulting functions; but one will have abandoned part of the programmatic motive of the theory. (It would be much like developing a "modified" Turing machine that did the same work as an ordinary Turing machine but whose method of computation was so complex that it was mysterious.)

Our elementary functions are total, not partial. The reason should be clear. Only total functions fit the intuitive sense of effective method. A partial function is not defined for some arguments, and hence computes no result for those values. It is part of the definition of an effective method that it will always produce the correct answer.

Our elementary functions are all functions of the natural numbers. Hence, they may take zero as input, but not a negative number, and not any rational or irrational fraction.

[ Here, you can see that this theory, in a way, includes the natural number theory. But when I say "natural number theory" it is just that wee little subset, where "zero is a natural number, and if n is a natural number, so is succ(n)." The various arithmetic operations are not defined - so no division by zero and such trouble! But in lambda calculus, this definition of natural numbers is just an encoding that we have made up, because we think it satisfies the notion of natural numbers that we want it to satisfy. - Srineet.]

On most lists of "ingredient" functions are the zero function, the successor function, and the identity function.

z(x) = 0
s(x) = successor of x (roughly, "x + 1")
id(x) = x

The zero function returns zero regardless of its argument. It is hard to imagine a more intuitively computable function.

The successor function returns the successor of its argument. Since successorship is a more primitive notion than addition, the "x + 1" that I've used to elucidate the function is simply for the convenience of readers. The addition function does not yet exist and must be built up later.

If successorship is obscure or non-intuitive in the absence of addition, remember that we have allowed ourselves to use the system of natural numbers. We can't calculate yet, but we can count, and that is all we need here. (Hence, the core of recursive function theory that must be intuitive includes not only simple functions and their building operations, but also the natural numbers.)

The zero and successor functions take only one argument each. But the identity function is designed to take any number of arguments. When it takes one argument (as above) it returns its argument as its value. Again, its effectiveness is obvious. When it takes more than one argument, it returns one of them.

id2,1(x,y) = x
id2,2(x,y) = y

The two subscripts to the identity function indicate, first, the number of arguments it takes, and second, which argument will be returned by the function. (We could write the identity function more generally so that it applied to any number of arguments, but to preserve simplicity we will not.)

While others could be added, these three elementary functions suffice (with the techniques to be introduced) to build up all the functions we believe to be computable in the intuitive sense. Given the simplicity of these "ingredient" functions, this is a remarkable simplification of computation.

[So these are the basic functions, on which we plan to base the theory of recursive functions. - Srineet.]
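
An editorial sketch (not from the article) of the initial functions as Python definitions; the names z, s, id_2_1 and id_2_2 just mirror the notation above:

def z(x):            # zero function: always 0
    return 0

def s(x):            # successor: roughly "x + 1"
    return x + 1

def id_2_1(x, y):    # projection: return the first of two arguments
    return x

def id_2_2(x, y):    # projection: return the second of two arguments
    return y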

Table of contents

Recursive Function Theory, Building Operations - Composition

Building Operations:

We will build more complex and interesting functions from the initial set by using only three methods. If we use the analogy to axiomatics again, the methods for building higher functions from our initial set correspond to rules of inference applied to axioms.

The three methods are composition, primitive recursion, and minimization.

Composition:

If we start with the successor function,

s(x)

then we may replace its argument, x, with a function. If we replace the argument, x, with the zero function, z(x),
then the result is the successor of zero.

s(z(x)) = 1
s(s(z(x))) = 2 and so on.

In this way, compounding some of the initial functions can describe the natural numbers.

This building operation is called "composition" (sometimes "substitution"). It rests on the simple fact that computable functions have values, which are numbers. Hence, where we would write a number (numeral) we can instead write any function with its argument(s) that computes that number. In particular, we may use a function with its argument(s) in the place of an argument.

If we think of a function and its arguments as an alternate name of its value, then composition simply allows us to use this kind of name as the argument to a second function.

In the language used in predicate logic, a function with its arguments is a term, and may be used wherever terms are used in predicate logic well formed formulas. The arguments of functions are also terms. Hence, a function can have other functions as arguments.

We can define the function that adds 2 to a given number thus: add2(x) = s(s(x)). Introducing new names to abbreviate the compound functions created by composition is a great convenience since complex functions are made more intuitive by chunking their simpler components. But the essence of composition is not the re-naming or abbreviating of compound functions, but the compounding itself.

Obviously, the computability or effectiveness of composition is not affected by the number of arguments in any of the component functions, provided it is finite.

If a function is a rule (for picking a member of the range when given a member of the domain), then it must be computed to reach its result. In that sense we may consider functions to be programs or software, and their arguments input or data. From this perspective, the act of composition replaces data with a program that, when run, gives as output the data one desired as input. Composition shows the interchangeability, under controlled circumstances, of software and data.

It should be clear that when composition is applied to computable functions, only computable functions will result. Even if this is obvious, it may be well to aid intuition with reason. Composition yields computable functions when applied to computable functions because it does nothing more than ask us to compute more than one function and to do so in a certain order (working from the inside of the compound expression to the outside). If the ingredient functions are computable, then the task of computing any finite number of them in sequence will be computable. If the number of nested functions is finite (if the length of wffs is finite), then the effectiveness of the overall computation is not diminished.
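
A tiny editorial sketch of composition (not from the article), reusing s and z from the initial-functions sketch; add2 is the example from the text:

def add2(x):
    return s(s(x))          # composition: a function written where an argument goes

assert add2(0) == 2
assert s(s(z(41))) == 2     # s(s(z(x))) = 2, whatever x is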

Table of contents

Recursive Function Theory, Building Operations - Primitive Recursion

Primitive Recursion:

The second building operation is called primitive recursion. To those not used to it, it can be far from intuitive; to those who have practiced with it, it is fundamental and elegant.

Function h is defined through functions f and g by primitive recursion when

h(x,0) = f(x)
h(x,s(y)) = g(x,h(x,y))

Let's unpack this slowly. First, remember that f and g are known computable functions. Primitive recursion is a method of defining a new function, h, through old functions, f and g. Second, notice that there are two equations. When h's second argument is zero, the first equation applies; when it is not zero, we use the second. Notice how the successor function enforces the condition that the argument be greater than zero in the second equation. Hence, the first equation applies in the "minimal case" and the second applies in every other case. (It is not unusual for a function to be defined by a series of equations, rather than just one, in order to show its value for relevantly different inputs or arguments.)

So when (1) the second argument of h is zero, the function is equivalent to some known function f and we compute it; otherwise (2) it is equivalent to some known function g and we compute it. That's how we know that function h, or more generally, the functions built by primitive recursion, will be computable.

It is the second case that is interesting. We already know how to compute function g, but look at its two arguments in this case. Instead of taking two numbers, function g is here a compound function and takes one number x and another function. Its second argument is expressed by function h again! This circle is the essence of primitive recursion. To calculate function h when neither argument is zero we must calculate function g, which requires us to calculate function h.

This would be impossible if the function took the same arguments in both instances. It would be like saying to someone who could not multiply 10 by 20, "Nothing could be simpler; one only has to multiply 10 by 20!" Fortunately, that is not the case here. Compare the arguments for function h that we see on the left and right sides of the second equation. To calculate h when its arguments are (x,s(y)) we must calculate h when its arguments are (x,y). So the function is calling itself with different arguments. Note the second argument in particular. It has decremented from the successor of y (= y+1) to y itself. This means that to calculate h, we must decrement one of its arguments by one and calculate that. So the function does not futilely call itself; it calls a variation of itself, which makes all the difference.

(We could write the equations for primitive recursion more generally to show how a function of any number of terms calls a variation of itself, but again for the sake of simplicity we will not.)

But how do we calculate the decremented variation of the original? Look at the equations. To calculate h when its second argument is n, we calculate h when its second argument is n-1; and to do that we calculate h when its second argument is n-2; and so on; that is the procedure required by the two equations. But eventually by this means the second argument will equal zero, in which case we use the first equation, and calculate function f, and are done.

Let's look at an example. To calculate 5! (5 "factorial"), we multiply 5x4x3x2x1. How would we define the general factorial function recursively? To say that it is n(n-1)(n-2)...(n-(n-1)) would be correct but not recursive. In that formulation, the function never calls itself and it contains the mysterious ellipsis (three dots). Recursive functions not only have the peculiarity of calling themselves, but they eliminate the need to use the ellipsis. This is a considerable improvement in clarity, rigor, completeness of specification, and intuitive effectiveness, even if it is still "weird" from another standpoint.

So, the recursive definition of the factorial function would be constructed like this. Initially, we note that 1! = 1. Now, to compute n!, we do not multiply n by (n-1), for that would commit us to the non-recursive series we saw above. Instead we multiply n by (n-1)!. And that's it; there is no further series. But to calculate (n-1)! we refer to our equations; if we have not reached 1 yet, then we multiply n-1 by (n-2)!, and so on until we do reach 1, whose factorial value is already defined. So the general formula would be expressed by these two equations:

1! = 1
n! = n(n-1)!

If we had already bothered to build up the functions for multiplication and subtraction from our initial functions and the building operations, then we could show more particularly how primitive recursion works as a building operation. The factorial function is derived by primitive recursion from the functions for multiplication and subtraction. A stricter definition of the factorial function, f(n), then, consists of these two equations:

f(n) = 1 when n = 1
f(n) = n(f(n-1)) when n > 1

The first equation defines the condition under which the self-calling circular process "bottoms out". Without it we would have an infinite loop which never returns an answer.

This kind of self-calling would clearly work just as well if the "bottom" condition were greater than n, and we incremented instead of decremented until we reached it.

Primitive recursion is like mathematical induction. The first equation defines the basis, and the second defines the induction step.

Note that not all "recursive functions" use the building operation called "recursion". (Worse, not all "primitive recursive functions" use "primitive recursion".) They are called "recursive functions" because they are recursively defined, just as a formal system typically defines its wffs recursively, enabling it to deal with an infinity of instances through an effective method for deciding whether arbitrary candidates are cases. The use of the terms "recursive" and "primitive recursive" for components of the overall theory, and again for the overall theory itself, is confusing and regrettable, but that's the way the terminology has evolved.
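
As an editorial sketch (not from the article), the two defining equations can be written as a small Python helper that builds h from known f and g; plus is addition built with it, reusing s from the initial-functions sketch ("add of x and 0 is x; add of x and y+1 is the successor of add of x and y"):

def prim_rec(f, g):
    # h(x, 0)   = f(x)
    # h(x, y+1) = g(x, h(x, y))
    def h(x, y):
        if y == 0:
            return f(x)
        return g(x, h(x, y - 1))
    return h

plus = prim_rec(lambda x: x, lambda x, r: s(r))
assert plus(3, 4) == 7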

Table of contents

Recursive Function Theory, Building Operations - More on Primitive Recursion

Here's some stuff from me about primitive recursion.

We saw the function

h(x, 0) = f(x)
h(x, s(y)) = g(x, h(x, y))

See h(x, 2), it can be drawn as an expanding tree:

          h
         / \
        x   2

on expansion

          g
         / \
        x   h
           / \
          x   1

on further expansion

          g
         / \
        x   g
           / \
          x   h
             / \
            x   0

finally...

          g
         / \
        x   g
           / \
          x   f
              |
              x

f and g are known functions, and h - the unknown in the definition - disappears! In fact, all primitive recursive functions on natural numbers will have exactly the same structure. Only the fs and the gs will change, and the type of x will be the type expected by f and g. We can call primitive recursion "structural recursion" because the overall structure or framework is fixed - zero is the base case and then we build over it. In fact, the very "data structure" here, which is natural numbers, is defined that way. We can exploit this structure and work on any kind of object (the type of x has no restriction); we use the second parameter to obtain the desired "structure" in the recursion.

What other kind of recursion can there be? Well the most general equation can be something like

f x = if p then c else f (g x)

where p is some arbitrary predicate (a random oscillator!), and hence nothing can be said about such functions.
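
For contrast, an editorial sketch (not from the mails) of that general shape in Python; here p is "x == 0", c is 0 and g subtracts 2, and nothing saves us - the function never reaches its base case for odd inputs:

def f(x):
    if x == 0:          # p
        return 0        # c
    return f(x - 2)     # f (g x)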

Now we know why everybody loves primitive recursion: we know it'll terminate, and it is also so useful!

We must also understand that this structural recursion is not fixed to this particular structure (or to natural numbers- which is the same thing!).

Consider binary trees instead of natural numbers.
A tree is

  1. a leaf.
  2. some node with its two subtrees.
Lets define a primitive recursive function on this...

h(x, leaf) = f(x)
h(x, node leftChild rightChild) = g(x, b(h(x, leftChild), h(x, rightChild)))

There are three known functions here :

  1. f that acts on the base case (leaf) as before
  2. g that works for the other case, but note that its second parameter is not a tree, but has the type of b.
  3. b to combine the hs from the two children.
There you go...here's a different "structure" for recursion which is also guaranteed to terminate. In fact, for any acyclic recursive data structure (yes, I am making up this term - hope you get the idea!), a kind of primitive recursion can be defined. I think this is what is called a "catamorphism" - but not at all sure!
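
An editorial Python sketch of this tree-shaped recursion (not from the mails). Trees are represented here, just for the sketch, as ("leaf",) or ("node", left, right); counting leaves is the example (x is unused there):

def tree_rec(f, g, b):
    # h(x, leaf)            = f(x)
    # h(x, node left right) = g(x, b(h(x, left), h(x, right)))
    def h(x, t):
        if t[0] == "leaf":
            return f(x)
        _, left, right = t
        return g(x, b(h(x, left), h(x, right)))
    return h

count_leaves = tree_rec(lambda x: 1, lambda x, r: r, lambda l, r: l + r)
t = ("node", ("leaf",), ("node", ("leaf",), ("leaf",)))
assert count_leaves(None, t) == 3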

Table of contents

Recursive Function Theory, Building Operations - Minimization

[This looks like a scary long article, but it makes pretty easy reading, and can be read pretty fast - so go ahead and finish it off! - Srineet.]

Minimization:

Let us take a function f(x). Suppose there is at least one value of x which makes f(x) = 0. Now suppose that we wanted to find the least value of x which made f(x) = 0. There is an obviously effective method for doing so. We know that x is a natural number. So we set x = 0, and then compute f(x); if we get 0 we stop, having found the least x which makes f(x) = 0; but if not, we try the next natural number 1. We try 0,1,2,3... until we reach the first value which makes f(x) = 0. When we know that there is such a value, then we know that this method will terminate in a finite time with the correct answer. If we say that g(x) is a function that computes the least x such that f(x) = 0, then we know that g is computable. We will say that g is produced from f by minimization.

The only catch is that we don't always know that there is any x which makes f(x) = 0. Hence the project of testing 0,1,2,3... may never terminate. If we run the test anyway, that is called unbounded minimization. Obviously, when g is produced by unbounded minimization, it is not effectively computable.

If we don't know whether there is any x which makes f(x) = 0, then there is still a way to make function g effectively computable. We can start testing 0,1,2,3... but set an upper bound to our search. We search until we hit the upper bound and then stop. If we find a value before stopping which makes f(x) = 0, then g returns that value; if not, then g returns 0. This is called bounded minimization.

(We could define functions f and g more generally so that each took any number of arguments, but again for simplicity we will not.)

Of course, as a building operation we must build only with computable functions. So we build g by minimization only if f(x) is already known to be computable.

While unbounded minimization has the disadvantages of a partial function which may never terminate, bounded minimization has the disadvantage of sometimes failing to minimize.

If you know any programming language, we can make a helpful comparison. Recall the distinction between loops that iterate a definite number of times (e.g. Pascal FOR loops) and those that iterate indefinitely until some special condition is met (e.g. Pascal REPEAT and WHILE loops). Let loops of these two kinds be given the task of testing natural numbers in search of the least value of x in the function f(x) = 0. For each given candidate number, n, starting with 0 and moving upwards, the loop checks to see whether f(n) = 0. The loops which iterate a definite number of times will test only a definite, finite number of candidates. To run such a loop is to take so many steps toward the computation of function g(x). This is bounded minimization at work. Indefinitely iterating loops represent unbounded minimization: they will keep checking numbers until they find one which makes the value of function f equal zero, or until the computer dies of old age or we turn it off.

In general we will have no way of knowing in advance whether either kind of loop will ever discover what it is looking for. But we will always know in advance that a bounded search will come to an end anyway, while an unbounded search may or may not. That is the decisive difference for effective computability.

Suppose an unbounded search does terminate with a good value for y. Since the program did halt, it must have halted after a definite, finite number of steps. That means that we could have done the same work with a bounded search, if only we had known how large to make the upper bound. (In fact, we could have done the whole search without loops at all, if only we had known how many times to repeat the incrementing statements.) This fact is more important for theory than for practice. For theory it means that whatever unbounded minimization can actually compute can be computed by bounded minimization; and since the latter is effective, the ineffectiveness of the former is significantly mitigated. (We can increase the bound in bounded minimization to any finite number, in effect covering the ground effectively that unbounded minimization covers ineffectively.) For practical computation this means little, however. Since we rarely know where to set the upper bound, we will use unbounded searches even though that risks non-termination (infinite loops). Our job is not made any easier by knowing that if our unbounded search should ever terminate, then a bounded search could have done the same work without the risk of non-termination.
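
An editorial Python sketch (not from the article) of the two kinds of minimization over the natural numbers; bounded_min and unbounded_min are names made up for this sketch:

def bounded_min(f, bound):
    # least x <= bound with f(x) == 0, else 0
    for x in range(bound + 1):
        if f(x) == 0:
            return x
    return 0

def unbounded_min(f):
    # least x with f(x) == 0; loops forever if there is no such x
    x = 0
    while f(x) != 0:
        x += 1
    return x

assert unbounded_min(lambda x: (x - 3) * (x - 7)) == 3
assert bounded_min(lambda x: x + 1, 100) == 0      # f is never zero, so we get 0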

Table of contents

Recursive Function Theory, Building Operations - Summary

Summary

Some terms will help us ring a sub-total. If we allow ourselves to use composition, primitive recursion, and bounded minimization as our building operations, then the functions we can build are called primitive recursive (even if we did not use the building operation of primitive recursion). If we add unbounded minimization to our toolkit, we can build a larger class of functions that are called general recursive or just plain recursive. More precisely, functions are general recursive only when they use unbounded minimization that happens to terminate. Functions using unbounded minimization that does not terminate are called partial recursive because they are partial functions.

Church's Thesis refers to general recursive functions, not to primitive recursive functions. There are effectively computable functions that are not primitive recursive, but that are general recursive (e.g. Ackermann's function). But the open challenge latent in Church's Thesis has elicited no case of an effectively computable function that is not general recursive (if general recursive subsumes and includes the primitive recursive).

(Nested classes, from outermost to innermost: partial recursive functions, general recursive functions, primitive recursive functions.)

The set of general recursive functions is demonstrably the same as the set of Turing computable functions. Church's Thesis asserts indemonstrably that it is the same as the set of intuitively computable functions. In Douglas Hofstadter's terms, the primitive recursive functions are the BlooP computable functions, and the general recursive functions are the terminating FlooP computable functions. In Pascal terms, the primitive recursive functions are the FOR loop computable functions, while the general recursive functions are the terminating WHILE and REPEAT loop computable functions. The partial recursive functions that are not also general or primitive recursive are the WHILE and REPEAT (FlooP) functions that loop forever without terminating.

Table of contents

Recursive Function Theory, Building Operations - Another Summary

Recursive Function Theory identifies a class of functions that are Turing computable (similar to lambda calculus): the class of recursive functions, built from simple, intuitively effective (finite time to reach a solution) functions using simple techniques.

There are three elementary functions used to build up all functions that are computable and effective: the zero function z(x) = 0, the successor function s(x), and the identity (projection) functions like id(x) = x.

Operations to build larger computable and effective functions:
  1. Composition:

    - an argument is replaced by a function
    - In some cases composition shows the interchangeability of software and data.

  2. Primitive Recursion:

    - a function is defined using primitive recursion as follows
    h(x,0) = f(x)
    h(x,s(y)) = g(x,h(x,y))
    where f and g are known computable functions.

    - in the 2nd equation, the succ(y) enforces that the second argument is > zero.
    In the minimal case the first equation is used and in all other cases the 2nd equation is used.
    Its notion is similar to that of mathematical induction.

    - a function can also be defined by a series of such equations

    For example, the factorial function can be written as

    f(n) = 1 when n = 1
    f(n) = n(f(n-1)) when n > 1

    - Structural recursion can be used for structures other than natural numbers too,
    e.g. binary trees

  3. Minimization:

    g(x), which computes the least x such that f(x) = 0, is said to be obtained from f by minimization.

    Unbounded Minimization:
    If we do not know that there is an x which makes f(x) = 0 and still run the test, then this minimization is called unbounded minimization. g thus obtained will not be effectively computable.

    We can make g in such cases effectively computable by keeping an upper bound on the value of x. This is called bounded minimization.

    Disadvantages:
    unbounded minimization : a partial function that may never terminate.
    bounded minimization : sometimes fails to minimize.

    Analogy:

    a *while* loop would be unbounded minimization and a *for* loop would be bounded minimization

    An unbounded search may terminate after a finite number of steps. Hence the same result could have been calculated using bounded minimization. For theory this means the ineffectiveness of unbounded minimization is significantly mitigated; for practical computation it is not much help.

Table of contents